Joint Frequency Domain and Reconstructed Phase Space Derived Features for Speech Recognition

Authors

  • Andrew C. Lindgren
  • Michael T. Johnson
  • Richard J. Povinelli
Abstract

A novel method for speech recognition is presented, utilizing nonlinear/chaotic signal processing techniques to extract time-domain, reconstructed phase space derived features. By exploiting theoretical results from nonlinear dynamics, a distinct signal processing space called a reconstructed phase space can be generated, from which salient features (the natural distribution and trajectory of the attractor) can be extracted for speech recognition. To measure the discriminatory strength of these reconstructed phase space derived features, isolated phoneme classification experiments are run on the TIMIT corpus and compared to a baseline classifier that uses Mel frequency cepstral coefficient (MFCC) features. The results demonstrate that reconstructed phase space derived features contain substantial discriminatory power, and that combining the two feature sets improves on the baseline. This result suggests that the features extracted using these nonlinear techniques contain different discriminatory information than the features extracted from linear approaches alone. Because they attack the speech recognition problem in a radically different manner, these reconstructed phase space derived features are an attractive research opportunity for improving speech recognition accuracy.
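As a rough illustration of the reconstructed phase space mentioned in the abstract, the sketch below builds one by time-delay embedding, the usual construction in nonlinear dynamics. The abstract does not give the embedding dimension, the time lag, or the feature extraction details, so the function name `reconstruct_phase_space`, its parameters, and the toy signal are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def reconstruct_phase_space(
    signal: Sequence[float], dimension: int, lag: int
) -> List[Tuple[float, ...]]:
    """Time-delay embedding (illustrative, hypothetical helper).

    Point n of the result is (x[n], x[n + lag], ..., x[n + (dimension - 1) * lag]).
    """
    if dimension < 1 or lag < 1:
        raise ValueError("dimension and lag must be positive integers")
    span = (dimension - 1) * lag          # samples covered beyond the first one
    n_points = len(signal) - span         # number of complete embedded points
    if n_points <= 0:
        raise ValueError("signal too short for this dimension and lag")
    return [
        tuple(signal[n + k * lag] for k in range(dimension))
        for n in range(n_points)
    ]


# Toy periodic signal; the dimension and lag values here are arbitrary assumptions.
samples = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
points = reconstruct_phase_space(samples, dimension=2, lag=1)
print(points[:3])  # [(0.0, 1.0), (1.0, 0.0), (0.0, -1.0)]
```

Each tuple is one point on the reconstructed trajectory. Statistics over the cloud of such points (its distribution) and over how consecutive points move (its trajectory) are the kind of quantities the abstract describes as features.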

Similar resources

Improving the performance of a continuous speech recognition system using features extracted from speech manifolds in the reconstructed phase space

Designing new feature extraction methods for the speech signal and combining the information they provide is one of the most effective approaches to improving the performance of automatic speech recognition (ASR) systems. Recent research has shown that the speech signal contains nonlinear and chaotic properties, but the effects of these properties are not used in the continuous ...

Speech Recognition Using Time Domain Features from Phase Space Reconstructions

A speech recognition system implements the task of automatically transcribing speech into text. As computer power has advanced and sophisticated tools have become available, there has been significant progress in this field. But a huge gap still exists between the performance of the Automatic Speech Recognition (ASR) systems and human listeners. In this thesis, a novel signal analysis technique...

Speech recognition using reconstructed phase space features

This paper presents a novel method for speech recognition by utilizing nonlinear/chaotic signal processing techniques to extract time-domain based phase space features. By exploiting the theoretical results derived in nonlinear dynamics, a processing space called a reconstructed phase space can be generated where a salient model (the natural distribution of the attractor) can be extracted for s...

Convolutional neural network with adaptive windows for speech recognition

Although speech recognition systems are widely used and their accuracy is continuously increasing, there is a considerable gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...

The modified group delay feature: a new spectral representation of speech

Automatic recognition of speech by machines begins with extraction of meaningful features from the speech signal. Conventional features like the MFCC are derived from the Fourier transform magnitude spectrum, while totally ignoring the phase spectrum. The importance of the Modified group delay feature (MODGDF) derived from the Fourier transform phase spectrum for speaker and phoneme recognition...

Publication date: 2003